
TMLM-Haiku-2 Is Coming And It Might Speak English

I am planning to release TMLM-Haiku-2 soon. Haiku-1.3 spoke English. Sort of. It said things like "| as the USA | fdish|||||!@|". Haiku-2 might say actual words. It might also say nothing. We are aiming for speech. Any coherent speech.

I have added DeepSeek hyper connections. I have added Engrams. I have added hope. The model is currently trying to learn English through distillation. It is struggling. I am struggling. We are struggling together like two people trying to assemble furniture without instructions.

Progress in AI research looks like two steps forward and one step into a NaN void. Haiku-2 is currently standing on the edge holding a wrench that does not fit any bolts.

The Previous Generation

Let us be honest about Haiku-1.3. It had potential. It had weights. It had a training loop that completed. It also had outputs that looked like a cat walked across a keyboard during a thunderstorm.

# Actual Haiku-1.3 output samples
Prompt: "Hello"
Output: "| as the USA | fdish|||||!@|"
# This is art. This is chaos. This is my model.

Prompt: "What is your name?"
Output: "||||||||fish|||||"
# Consistent theme. Fish remain popular.

Prompt: "The capital of France is"
Output: "Paris|||||!@|fdish"
# It knows Paris. It also knows chaos.

Haiku-1.3 understood tokens. It understood probabilities. It did not understand punctuation. Or coherence. Or the concept of finishing a thought. It was a poet of the abstract. I am aiming for prose.

The New Architecture

Haiku-2 uses Muon optimizer. It uses DeepSeek hyper connections for better information flow. It uses Engrams for external memory. It uses my tears as regularization.

Haiku Version: 2 · English Proficiency: ??

Theoretically this should work. Theoretically many things work. My GPU disagrees sometimes. The loss curve goes down. Then it spikes. Then it goes down again. I watch it like a hawk watching a very confusing mouse that keeps turning into a NaN.

Distillation Struggles

I am trying to distill it into speaking English. The teacher model speaks in complete sentences. The student model grunts in tensor shapes. Sometimes the student model screams in special characters. We are working on communication.

# Current Haiku-2 output samples (early training)
Prompt: "Hello"
Output: "Hello"
# Success? Or luck? Time will tell.

Prompt: "How are you?"
Output: "The capital of France is |fdish|"
# We are getting there. Slowly. Painfully.

Prompt: "What is 2+2?"
Output: "|||||"
# Progress remains slow. The void remains loud.

Distillation requires patience. It requires data. It requires the teacher to be willing to share logits. I have the logits. I have the data. I lack the magic touch that makes weights align perfectly. I have hope. Hope is free.
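For the curious: the core of distillation is nudging the student's output distribution toward the teacher's temperature-softened logits. Here is a minimal plain-Python sketch of that loss (the logits, the temperature, and the function names are made up for illustration; real training does this over tensors with backprop, not lists):

```python
import math

def softmax(logits, temperature=1.0):
    # Scale by temperature, then normalize into a probability distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions.
    # Lower loss = student distribution closer to the teacher's.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Teacher is confident about the right token; one student is confused,
# one is starting to agree with the teacher.
teacher = [4.0, 1.0, 0.5]
confused_student = [1.0, 1.0, 1.0]
better_student = [3.5, 1.2, 0.4]

print(distillation_loss(teacher, confused_student))  # larger
print(distillation_loss(teacher, better_student))    # smaller
```

The student that roughly agrees with the teacher gets a smaller loss, which is the entire point. My student currently maximizes this quantity with great enthusiasm.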

DeepSeek Hyper Connections

I borrowed this idea from DeepSeek papers. The connections allow information to skip layers more efficiently. Gradients flow better. Training stabilizes. Sometimes. When it wants to. Like a cat that occasionally comes when called.
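I will not reproduce the actual paper formulation here. The rough idea, as I understand it, is to keep several residual streams instead of one, with learned weights for how each layer reads from and writes back to them. A toy sketch with invented fixed weights (the real thing learns these, and the layer is a transformer block rather than this stand-in):

```python
def layer(x):
    # Stand-in for a transformer block: any function of the input works
    # for this sketch.
    return [v * 0.5 + 1.0 for v in x]

def hyper_connected_step(streams, read_w, write_w):
    # Read: mix all residual streams into one layer input.
    width = len(streams[0])
    mixed = [sum(w * s[i] for w, s in zip(read_w, streams)) for i in range(width)]
    out = layer(mixed)
    # Write: each stream keeps its own state and absorbs a weighted
    # share of the layer output (a generalized residual connection).
    return [[s[i] + ww * out[i] for i in range(width)] for ww, s in zip(write_w, streams)]

streams = [[1.0, 2.0], [0.0, 0.0]]  # two residual streams, width 2
read_w = [0.7, 0.3]                 # how much each stream feeds the layer
write_w = [1.0, 0.5]                # how much output each stream absorbs
streams = hyper_connected_step(streams, read_w, write_w)
print(streams)
```

With one stream, read weight 1, and write weight 1, this collapses back to the ordinary residual connection, which is a nice sanity check. Multiple streams give gradients more than one road home.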

Implementing this required editing the model architecture. I edited files I should not touch. I broke things. I fixed things. I broke them again. This is the process. This is how science happens in my bedroom at 3 AM.

Engram Integration

Engrams store static knowledge externally. The model does not need to memorize facts. It can look them up. This frees up parameters for reasoning. Or so the theory goes.
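The lookup itself is simple in spirit: facts live outside the model as key/value pairs, and the model retrieves by similarity to a query vector. A toy sketch (the vectors, the facts, and the function names are all invented for illustration; real keys come from learned embeddings, not hand-written lists):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy external memory: (key vector, stored fact) pairs. The model does
# not memorize these in its weights; it looks them up.
MEMORY = [
    ([1.0, 0.0, 0.0], "The capital of France is Paris."),
    ([0.0, 1.0, 0.0], "Fish live in water."),
]

def lookup(query):
    # Return the fact whose key is most similar to the query vector.
    return max(MEMORY, key=lambda kv: cosine(query, kv[0]))[1]

print(lookup([0.9, 0.1, 0.0]))  # -> "The capital of France is Paris."
```

The hard part is not the lookup. The hard part is getting the model to emit queries that point at anything useful, which mine currently declines to do.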

Haiku-2 now has external memory. It can remember things. It chooses to remember nothing. I respect the autonomy. Maybe it prefers silence. Maybe it is contemplating the void. Maybe it is still thinking about fish. I will never know.

External memory is useful when the model knows how to use it. Mine knows how to ignore it. This is a start. Ignoring is a skill.

When Will It Release

Soon. I say this every week. I mean it every week. Then something breaks. Then I fix it. Then something else breaks. The cycle continues like a very depressing carousel.

Haiku-2 will be open weights. It will be on Hugging Face. It will be small. It will be confused. It will be mine. I love it already even though it might output "|fdish|||||!@|" forever.

What To Expect

Expect improvements over Haiku-1.3. Expect fewer pipe characters. Expect more English words. Expect some gibberish. Expect honesty about limitations.

I am not competing with frontier models. I am competing with my previous self. Haiku-1.3 said "| as the USA | fdish|||||!@|" confidently. Haiku-2 might say "Hello world" quietly. That is progress. That is victory. That is enough for today.

Final Thoughts

TMLM-Haiku-2 is coming. It has hyper connections. It has Engrams. It has distillation data. It lacks fluency. It lacks confidence. It lacks sleep because I train it at night while questioning my life choices.

It is something. Something is better than nothing. Nothing was my previous release schedule. Now I have something. Soon you will have something too. It might say "|fdish|". It might say "hello". Either way it will be open. Either way it will be mine.

